Abstract - A novel method of constructing a joint PDF under $\mathcal{H}_1$, when the joint PDF under $\mathcal{H}_0$ is known, is developed. It has direct application in distributed detection systems. The construction is based on the exponential family, and it is shown that the constructed PDF is asymptotically optimal. The generalized likelihood ratio test (GLRT) is derived based on this method for the partially observed linear model. Interestingly, the test statistic is equivalent to the clairvoyant GLRT, which uses the true PDF under $\mathcal{H}_1$, even if the noise is non-Gaussian.
1 Introduction

Data fusion, or sensor fusion, in distributed detection systems has been widely studied over the years. By combining the data from different sensors, better performance can be expected than with a single sensor alone. The optimal detection performance can be obtained if the joint probability density function (PDF) of the measurements from the different sensors under each hypothesis is completely known. In practice, however, this joint PDF is usually not available, so a key issue in this area is how to construct the joint PDF of the measurements from different sensors. One common approach is to assume that the measurements are independent [1], [2]. This approach has been widely used due to its simplicity, since the joint PDF is then the product of the marginal PDFs. It leads to the product rule for combining classifiers, which is effectively a severe rule since, as stated in [3], “it is sufficient for a single recognition engine to inhibit a particular interpretation by outputting a close to zero probability for it”. Moreover, independence is a strong assumption, and the measurements can be correlated in many cases. The dependence between measurements has been considered in [4, 5, 6]. A copula-based framework is used in [4, 5] to estimate the joint PDF from the marginal PDFs. The exponentially embedded families (EEFs) are proposed in [6] to asymptotically minimize the Kullback-Leibler (KL) divergence between the true PDF and the estimated one.

Note that all the above methods are based on the assumption that the marginal PDFs of the measurements are known. In many cases, however, the marginal PDFs may not be available or accurate, for example when there is not enough training data. In this paper, we present a new way of constructing a joint PDF that requires only a reference PDF and no knowledge of the marginal PDFs. The constructed joint PDF takes the form of the exponential family, so the maximum likelihood estimate (MLE) of the unknown parameters can be found easily. Since no Gaussian assumption is placed on the reference PDF, this method can be very useful when the underlying distributions are non-Gaussian. In the examples where we apply this method to the detection problem, the detection statistic can be shown, under some conditions, to be the same as that of the clairvoyant generalized likelihood ratio test (GLRT), which is the test when the true PDF under $\mathcal{H}_1$ is known except for the usual unknown parameters.

The paper is organized as follows. Section 2 formulates the detection problem. The construction of the joint PDF is presented and applied to the detection problem in Section 3. The KL divergence between the true PDF and the constructed PDF is examined in Section 4. We give two examples in Section 5. Some simulation results are shown in Section 6. Conclusions are given in Section 7.
2 Problem Statement

Consider the detection problem when we observe the outputs of two sensors, $\mathbf{T}_1(\mathbf{x})$ and $\mathbf{T}_2(\mathbf{x})$, which are transformations of the underlying samples $\mathbf{x}$ that are unobservable (see Figure 1). All the results are valid for any number of sensors; we choose two for simplicity. Assume that we have enough training data $\mathbf{T}_{1i}(\mathbf{x})$'s and $\mathbf{T}_{2i}(\mathbf{x})$'s under $\mathcal{H}_0$, when there is no signal present. Hence we have a good estimate of the joint PDF of $\mathbf{T}_1$ and $\mathbf{T}_2$ under $\mathcal{H}_0$ (see [7]), and thus we assume that $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0)$ is completely known. Under $\mathcal{H}_1$, when a signal is present, we may not have enough training data to estimate the joint PDF. So our goal is to construct an appropriate $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1)$ and use it for detection. Since $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1)$ cannot be uniquely specified based on $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0)$, we need the following reasonable assumptions to construct the joint PDF.

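1) Under $\mathcal{H}_1$ the signal is small, and thus $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1)$ is close to $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0)$.

2) $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1)$ depends on signal parameters $\boldsymbol{\theta}$ so that

$$p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_1) = p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\boldsymbol{\theta})$$

and

$$p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0) = p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathbf{0})$$

Note that since $\boldsymbol{\theta}$ represents signal amplitudes, $\boldsymbol{\theta} \neq \mathbf{0}$ under $\mathcal{H}_1$. Therefore, the detection problem is

$$\mathcal{H}_0: \boldsymbol{\theta} = \mathbf{0}$$
$$\mathcal{H}_1: \boldsymbol{\theta} \neq \mathbf{0}$$

Figure 1: Distributed detection system with two sensors. [Diagram labels: $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\mathcal{H}_0)$; Central Processor.]

3 Construction of Joint PDF for Detection

To simplify the notation, let

$$\mathbf{T} = \begin{bmatrix} \mathbf{T}_1 \\ \mathbf{T}_2 \end{bmatrix}$$

so that the joint PDF $p_{\mathbf{T}_1,\mathbf{T}_2}(\mathbf{t}_1,\mathbf{t}_2;\boldsymbol{\theta})$ can be written as $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})$. Since we assume that $\|\boldsymbol{\theta}\|$ is small, we expand the log-likelihood function using a first-order Taylor expansion:

$$\ln p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta}) = \ln p_{\mathbf{T}}(\mathbf{t};\mathbf{0}) + \boldsymbol{\theta}^T \left.\frac{\partial \ln p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\mathbf{0}} + o(\|\boldsymbol{\theta}\|) \tag{1}$$

We omit the $o(\|\boldsymbol{\theta}\|)$ term, but in order for $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})$ to be a valid PDF, we normalize it to integrate to one as

$$p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta}) = \exp\left[\boldsymbol{\theta}^T \left.\frac{\partial \ln p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\mathbf{0}} - K(\boldsymbol{\theta}) + \ln p_{\mathbf{T}}(\mathbf{t};\mathbf{0})\right] \tag{2}$$

where

$$K(\boldsymbol{\theta}) = \ln E_0\left[\exp\left(\boldsymbol{\theta}^T \left.\frac{\partial \ln p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\mathbf{0}}\right)\right] \tag{3}$$

Here $E_0$ denotes the expected value under $\mathcal{H}_0$.

Next we assume that the sensor outputs are the score functions, i.e.,

$$\mathbf{t} = \left.\frac{\partial \ln p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\mathbf{0}} \tag{4}$$

and are sufficient statistics for the constructed PDF under $\mathcal{H}_1$. This will be true if $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})$ is in the exponential family with

$$p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta}) = \exp\left[\boldsymbol{\theta}^T \mathbf{t} - K(\boldsymbol{\theta}) + \ln p_{\mathbf{T}}(\mathbf{t};\mathbf{0})\right] \tag{5}$$

where

$$K(\boldsymbol{\theta}) = \ln E_0\left[\exp\left(\boldsymbol{\theta}^T \mathbf{T}\right)\right] \tag{6}$$

and $E_0(\mathbf{T}) = \mathbf{0}$. This can be easily verified since, by (5), we have

$$\frac{\partial \ln p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \mathbf{t} - \frac{\partial K(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}$$

and

$$\left.\frac{\partial K(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\mathbf{0}} = E_0(\mathbf{T})$$

as well-known properties of the exponential family. Note that even if $E_0(\mathbf{T}) \neq \mathbf{0}$, we still have

$$\mathbf{t} - E_0(\mathbf{T}) = \left.\frac{\partial \ln p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\mathbf{0}}$$

We can use $\mathbf{t} - E_0(\mathbf{T})$ instead of $\mathbf{t}$ as the sensor outputs and hence still satisfy (4) and (5). As a result, we will use (5) as our constructed PDF. This implies that $\mathbf{t}$ is a sufficient statistic for the constructed exponential PDF, and hence this PDF incorporates all the sensor information. Note that if $\mathbf{T}_1$, $\mathbf{T}_2$ are statistically dependent under $\mathcal{H}_0$, they will also be dependent under $\mathcal{H}_1$. Also note that only $p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$ is required in (5). It is assumed in practice that this can be estimated or found analytically [7] with reasonable accuracy.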
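As a brief check of the two exponential-family properties quoted above, the following worked derivation differentiates $K(\boldsymbol{\theta})$ in (6) directly. It is our own sketch rather than part of the original derivation: it assumes differentiation may be taken inside the expectation, and $E_{\boldsymbol{\theta}}$ (the expectation under the constructed PDF (5)) is notation introduced here for illustration.

```latex
\begin{align*}
\frac{\partial K(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}
  &= \frac{E_0\!\left[\mathbf{T}\, e^{\boldsymbol{\theta}^T \mathbf{T}}\right]}
          {E_0\!\left[e^{\boldsymbol{\theta}^T \mathbf{T}}\right]}
   = \int \mathbf{t}\, e^{\boldsymbol{\theta}^T \mathbf{t} - K(\boldsymbol{\theta})}\,
       p_{\mathbf{T}}(\mathbf{t};\mathbf{0})\, d\mathbf{t}
   = E_{\boldsymbol{\theta}}[\mathbf{T}], \\
\left.\frac{\partial K(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta}=\mathbf{0}}
  &= E_0[\mathbf{T}], \\
\frac{\partial^2 K(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}^T}
  &= E_{\boldsymbol{\theta}}[\mathbf{T}\mathbf{T}^T]
     - E_{\boldsymbol{\theta}}[\mathbf{T}]\,E_{\boldsymbol{\theta}}[\mathbf{T}]^T
   = \operatorname{cov}_{\boldsymbol{\theta}}(\mathbf{T}) \succeq \mathbf{0}.
\end{align*}
```

The second line is the property used to justify replacing $\mathbf{t}$ by $\mathbf{t} - E_0(\mathbf{T})$. The third line shows that the Hessian of $K$ is a covariance matrix, so $K$ is convex.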

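Since $\boldsymbol{\theta}$ is unknown, the GLRT is used for detection [8]. We want to maximize $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})$, or equivalently $\ln \frac{p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta})}{p_{\mathbf{T}}(\mathbf{t};\mathbf{0})} = \boldsymbol{\theta}^T \mathbf{t} - K(\boldsymbol{\theta})$, over $\boldsymbol{\theta}$. This is a convex optimization problem since $K(\boldsymbol{\theta})$ is convex by Hölder's inequality [9]. Hence many convex optimization techniques can be utilized [10, 11]. By taking the derivative with respect to $\boldsymbol{\theta}$, the MLE of $\boldsymbol{\theta}$ is found by solving

$$\mathbf{t} = \frac{\partial K(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \tag{7}$$

Also, because $K(\boldsymbol{\theta})$ is convex, the MLE $\hat{\boldsymbol{\theta}}$ is unique. Then we decide $\mathcal{H}_1$ if

$$\ln \frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}})}{p_{\mathbf{T}}(\mathbf{t};\mathbf{0})} = \hat{\boldsymbol{\theta}}^T \mathbf{t} - K(\hat{\boldsymbol{\theta}}) > \tau \tag{8}$$

where $\tau$ is a threshold.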
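A minimal numerical sketch of (7) and (8) follows, for the case where $K(\boldsymbol{\theta})$ has no closed form and is approximated by a sample average over training vectors recorded under $\mathcal{H}_0$. The Monte Carlo approximation of $K$, the damped Newton iteration, the synthetic data, and the names `empirical_K` and `glrt_statistic` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def empirical_K(theta, T0):
    """Sample-average estimate of K(theta) = ln E_0[exp(theta^T T)],
    with its gradient and Hessian. T0 holds one H0 training vector per row."""
    e = T0 @ theta                                  # theta^T T_i for each sample
    K = np.logaddexp.reduce(e) - np.log(len(T0))    # numerically stable log-mean-exp
    p = np.exp(e - e.max())
    p /= p.sum()                                    # exponentially tilted sample weights
    mean = p @ T0                                   # gradient: tilted mean of T
    centered = T0 - mean
    hess = (centered * p[:, None]).T @ centered     # Hessian: tilted covariance of T
    return K, mean, hess

def glrt_statistic(t, T0, iters=100, tol=1e-9):
    """Maximize theta^T t - K(theta) by damped Newton steps.
    Returns (theta_hat, statistic), i.e. the solution of (7) and the left side of (8)."""
    theta = np.zeros(T0.shape[1])
    for _ in range(iters):
        K, g, hess = empirical_K(theta, T0)
        resid = t - g                               # zero when (7) holds
        if np.linalg.norm(resid) < tol:
            break
        step = np.linalg.solve(hess, resid)
        f0, a = theta @ t - K, 1.0
        while a > 1e-10:                            # backtrack until the objective improves
            cand = theta + a * step
            if cand @ t - empirical_K(cand, T0)[0] >= f0 + 1e-4 * a * (resid @ step):
                break
            a *= 0.5
        theta = theta + a * step
    return theta, theta @ t - empirical_K(theta, T0)[0]

rng = np.random.default_rng(0)
# Synthetic H0 training data (hypothetical): correlated zero-mean Gaussian outputs
T0 = rng.standard_normal((5000, 2)) @ np.array([[1.0, 0.5], [0.0, 1.0]])
T0 -= T0.mean(axis=0)                               # enforce E_0(T) = 0 as in the text
t = np.array([0.4, -0.3])                           # observed sensor outputs (hypothetical)
theta_hat, stat = glrt_statistic(t, T0)
```

With an empirical $K$, a maximizer exists only when $\mathbf{t}$ lies inside the convex hull of the training vectors, so this sketch is only suitable for observations within the range of the training data.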


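4 KL Divergence Between the True PDF and the Constructed PDF

The KL divergence is a non-symmetric measure of difference between two PDFs. For two PDFs $p_1$ and $p_0$, it is defined as

$$D(p_1 \| p_0) = \int p_1(\mathbf{x}) \ln \frac{p_1(\mathbf{x})}{p_0(\mathbf{x})}\, d\mathbf{x}$$

It is well known that $D(p_1 \| p_0) \geq 0$ with equality if and only if $p_1 = p_0$ [12]. By Stein's lemma [13], the KL divergence measures the asymptotic performance for detection.

It can be shown that $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}})$ is optimal under both hypotheses. That is, under $\mathcal{H}_0$, $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}}) = p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$ asymptotically, and under $\mathcal{H}_1$, $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\theta}})$ is asymptotically the closest to the true PDF in KL divergence. Similar results and arguments have been shown in [6, 14].

5 Examples

In this section, we apply the constructed PDF of (5) to some detection problems. We start with the simple case of Gaussian noise, and then extend the result to the more general case of Gaussian mixture noise.

5.1 Partially Observed Linear Model with Gaussian Noise

Suppose we have the linear model

$$\mathbf{x} = \mathbf{H}\boldsymbol{\alpha} + \mathbf{w} \tag{9}$$

with

$$\mathcal{H}_0: \boldsymbol{\alpha} = \mathbf{0}$$
$$\mathcal{H}_1: \boldsymbol{\alpha} \neq \mathbf{0}$$

where $\mathbf{x}$ is an $N \times 1$ vector of the underlying unobservable samples, $\mathbf{H}$ is an $N \times p$ observation matrix with full column rank, $\boldsymbol{\alpha}$ is a $p \times 1$ vector of the unknown signal amplitudes, and $\mathbf{w}$ is an $N \times 1$ vector of white Gaussian noise with known variance $\sigma^2$. We observe two sensor outputs

$$\mathbf{T}_1(\mathbf{x}) = \mathbf{H}_1^T \mathbf{x}, \qquad \mathbf{T}_2(\mathbf{x}) = \mathbf{H}_2^T \mathbf{x} \tag{10}$$

where $\mathbf{H}_1$ and $\mathbf{H}_2$ could be any subsets of the columns of $\mathbf{H}$. Note that $[\mathbf{H}_1, \mathbf{H}_2]$ does not have to be $\mathbf{H}$. This model is called a partially observed linear model. Note that a sufficient statistic is $\mathbf{H}^T\mathbf{x}$, so there is some information loss over the case when $\mathbf{x}$ is observed, unless $\mathbf{H} = [\mathbf{H}_1, \mathbf{H}_2]$.

Let $\mathbf{G} = [\mathbf{H}_1, \mathbf{H}_2]$; then we have

$$\mathbf{T} = \begin{bmatrix} \mathbf{T}_1(\mathbf{x}) \\ \mathbf{T}_2(\mathbf{x}) \end{bmatrix} = \begin{bmatrix} \mathbf{H}_1^T \mathbf{x} \\ \mathbf{H}_2^T \mathbf{x} \end{bmatrix} = \mathbf{G}^T \mathbf{x} \tag{11}$$

Therefore, $\mathbf{T}$ is also Gaussian with PDF

$$\mathbf{T} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{G}^T\mathbf{G}) \quad \text{under } \mathcal{H}_0$$

and $\mathbf{T}_1$, $\mathbf{T}_2$ are seen to be correlated for $\mathbf{H}_1^T\mathbf{H}_2 \neq \mathbf{0}$. As a result, we construct the PDF as in (5) with

$$K(\boldsymbol{\theta}) = \ln E_0\left[\exp\left(\boldsymbol{\theta}^T\mathbf{T}\right)\right] = \frac{1}{2}\sigma^2 \boldsymbol{\theta}^T \mathbf{G}^T\mathbf{G}\boldsymbol{\theta} \tag{12}$$

Note that $\boldsymbol{\theta}$ is the vector of the unknown parameters in the constructed PDF, and it is different from the unknown parameters $\boldsymbol{\alpha}$ in the linear model.

By (7) and (12), the MLE of $\boldsymbol{\theta}$ satisfies

$$\mathbf{t} = \frac{\partial K(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \sigma^2 \mathbf{G}^T\mathbf{G}\boldsymbol{\theta}$$

So

$$\hat{\boldsymbol{\theta}} = \frac{1}{\sigma^2}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$$

and the GLRT statistic becomes

$$\hat{\boldsymbol{\theta}}^T\mathbf{t} - K(\hat{\boldsymbol{\theta}}) = \frac{1}{2\sigma^2}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t} \tag{13}$$

Next we consider the clairvoyant GLRT, that is, the GLRT when we know the true PDF of $\mathbf{T}$ under $\mathcal{H}_1$ except for the underlying unknown parameters $\boldsymbol{\alpha}$. From (11) we know that

$$\mathbf{T} \sim \mathcal{N}(\mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}, \sigma^2 \mathbf{G}^T\mathbf{G}) \quad \text{under } \mathcal{H}_1$$

We write the true PDF under $\mathcal{H}_1$ as $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\alpha})$. The MLE of $\boldsymbol{\alpha}$ is found by maximizing

$$\ln \frac{p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\alpha})}{p_{\mathbf{T}}(\mathbf{t};\mathbf{0})} = -\frac{1}{2\sigma^2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}) + \frac{1}{2\sigma^2}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$$

Let $\mathbf{t}$ be $q \times 1$. If $q < p$, i.e., the length of $\mathbf{t}$ is less than the length of $\boldsymbol{\alpha}$, then the MLE $\hat{\boldsymbol{\alpha}}$ may not be unique. Since $(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}) \geq 0$, and we can always find $\hat{\boldsymbol{\alpha}}$ such that $\mathbf{t} = \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}}$, the quadratic form attains its minimum value of zero at $\hat{\boldsymbol{\alpha}}$. Hence the clairvoyant GLRT statistic becomes

$$\ln \frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\alpha}})}{p_{\mathbf{T}}(\mathbf{t};\mathbf{0})} = \frac{1}{2\sigma^2}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$$

which is the same as the GLRT on our constructed PDF (see (13)) when $q < p$.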
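The equivalence in the Gaussian case can be checked numerically. The sketch below is an illustration under assumed values: the random matrix `H`, the amplitudes `alpha`, the variance `sigma2`, and the choice of sensor columns are all hypothetical. It evaluates the statistic from the constructed PDF, the clairvoyant statistic, and the closed form in (13).

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, sigma2 = 20, 4, 2.0                      # hypothetical sizes and noise variance
H = rng.standard_normal((N, p))                # full column rank with probability one
G = H[:, [0, 2]]                               # the sensors see two columns of H (q = 2 < p)
alpha = np.array([0.5, -1.0, 0.3, 0.8])        # hypothetical signal amplitudes
x = H @ alpha + np.sqrt(sigma2) * rng.standard_normal(N)
t = G.T @ x                                    # sensor outputs, as in (11)

GtG = G.T @ G
# GLRT on the constructed PDF: theta_hat solves (7) with K taken from (12)
theta_hat = np.linalg.solve(GtG, t) / sigma2
stat_constructed = theta_hat @ t - 0.5 * sigma2 * theta_hat @ GtG @ theta_hat

# Clairvoyant GLRT: any alpha_hat with G^T H alpha_hat = t zeroes the quadratic form;
# lstsq returns the minimum-norm such solution of this underdetermined system
alpha_hat = np.linalg.lstsq(G.T @ H, t, rcond=None)[0]
r = t - G.T @ H @ alpha_hat
stat_clairvoyant = (-r @ np.linalg.solve(GtG, r) + t @ np.linalg.solve(GtG, t)) / (2 * sigma2)

closed_form = t @ np.linalg.solve(GtG, t) / (2 * sigma2)   # right-hand side of (13)
```

All three quantities agree to numerical precision, and the residual `r` is zero, which is the step that makes the clairvoyant statistic independent of the particular $\hat{\boldsymbol{\alpha}}$ chosen.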


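5.2 Partially Observed Linear Model with Non-Gaussian Noise

The partially observed linear model remains the same as in the previous subsection, except that instead of assuming that $\mathbf{w}$ is white Gaussian, we assume that $\mathbf{w}$ has a Gaussian mixture distribution with two components, i.e.,

$$\mathbf{w} \sim \pi\,\mathcal{N}(\mathbf{0}, \sigma_1^2\mathbf{I}) + (1-\pi)\,\mathcal{N}(\mathbf{0}, \sigma_2^2\mathbf{I}) \tag{14}$$

where $\pi$, $\sigma_1^2$ and $\sigma_2^2$ are known ($0 < \pi < 1$). The following derivation can be easily extended to a mixture with more components, $\mathbf{w} \sim \sum_i \pi_i\,\mathcal{N}(\mathbf{0}, \sigma_i^2\mathbf{I})$.

Since $\mathbf{w}$ has a Gaussian mixture distribution, $\mathbf{T} = \mathbf{G}^T\mathbf{x}$ is also Gaussian mixture distributed and

$$\mathbf{T} \sim \pi\,\mathcal{N}(\mathbf{0}, \sigma_1^2\mathbf{G}^T\mathbf{G}) + (1-\pi)\,\mathcal{N}(\mathbf{0}, \sigma_2^2\mathbf{G}^T\mathbf{G}) \quad \text{under } \mathcal{H}_0$$

It can be shown that the GLRT statistic is

$$\max_{\boldsymbol{\theta}} \left\{ \boldsymbol{\theta}^T\mathbf{t} - \ln\left[\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}} + (1-\pi) e^{\frac{1}{2}\sigma_2^2\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}}\right] \right\} \tag{15}$$

Although no analytical solution for the MLE of $\boldsymbol{\theta}$ exists, it can be found using convex optimization techniques [10, 11]. Moreover, an analytical solution exists as $\|\boldsymbol{\theta}\| \to 0$. It can be shown that

$$\hat{\boldsymbol{\theta}} \to \frac{1}{\pi\sigma_1^2 + (1-\pi)\sigma_2^2}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t} \tag{16}$$

and the statistic in (15) satisfies

$$\max_{\boldsymbol{\theta}} \left\{ \boldsymbol{\theta}^T\mathbf{t} - K(\boldsymbol{\theta}) \right\} \to \frac{1}{2\left(\pi\sigma_1^2 + (1-\pi)\sigma_2^2\right)}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t} \tag{17}$$

as $\|\boldsymbol{\theta}\| \to 0$.

The clairvoyant GLRT statistic can be shown to be equivalent to

$$\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t} \tag{18}$$

when $q < p$. Hence the clairvoyant GLRT coincides with the GLRT using the constructed PDF as $\|\boldsymbol{\theta}\| \to 0$.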
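The maximization in (15) and its small-signal limits (16) and (17) can be illustrated with a short numerical sketch. It relies on an observation of ours that the text does not state: the gradient of $K$ in (15) is a positive scalar times $\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}$, so the solution of (7) has the form $\hat{\boldsymbol{\theta}} = s\,(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$ and only the scalar $s$ needs to be found, here by bisection. The function names and the example `GtG` and `t` are hypothetical; the mixture parameters are those quoted in the simulations.

```python
import numpy as np

def mixture_K(q, pi1, v1, v2):
    """K in (15) as a function of q = theta^T G^T G theta."""
    return np.logaddexp(np.log(pi1) + 0.5 * v1 * q, np.log(1 - pi1) + 0.5 * v2 * q)

def tilted_variance(q, pi1, v1, v2):
    """Average of v1, v2 under weights proportional to pi_i * exp(v_i * q / 2).
    The gradient of K with respect to theta is this scalar times GtG @ theta."""
    e = np.array([np.log(pi1) + 0.5 * v1 * q, np.log(1 - pi1) + 0.5 * v2 * q])
    w = np.exp(e - e.max())                    # shift exponents to avoid overflow
    return float(w @ np.array([v1, v2]) / w.sum())

def mixture_glrt(t, GtG, pi1, v1, v2, iters=200):
    """Solve (7) for the mixture K: theta_hat = s * GtG^{-1} t with
    s * tilted_variance(s^2 * m) = 1, where m = t^T GtG^{-1} t."""
    u = np.linalg.solve(GtG, t)
    m = float(t @ u)
    lo, hi = 0.0, 1.0 / min(v1, v2)            # the left side is < 1 at lo and >= 1 at hi
    for _ in range(iters):
        s = 0.5 * (lo + hi)
        if s * tilted_variance(s * s * m, pi1, v1, v2) < 1.0:
            lo = s
        else:
            hi = s
    s = 0.5 * (lo + hi)
    theta_hat = s * u
    return theta_hat, theta_hat @ t - mixture_K(s * s * m, pi1, v1, v2)

GtG = np.array([[20.0, 1.0], [1.0, 10.0]])     # hypothetical G^T G
t = np.array([0.3, -0.2])                      # small observation, so theta_hat is small
theta_hat, stat = mixture_glrt(t, GtG, pi1=0.9, v1=50.0, v2=500.0)

vbar = 0.9 * 50.0 + 0.1 * 500.0                # pi*sigma_1^2 + (1 - pi)*sigma_2^2
theta_asym = np.linalg.solve(GtG, t) / vbar    # asymptotic form (16)
stat_asym = t @ np.linalg.solve(GtG, t) / (2 * vbar)   # asymptotic form (17)
```

For this small `t` the numerical solution and the asymptotic forms agree closely; for larger observations they separate, which is why a numerical solution of (15) is needed in general.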

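Note that the noise in (14) is uncorrelated but not independent. We now consider the more general case in which the noise can be correlated, with PDF

$$\mathbf{w} \sim \pi\,\mathcal{N}(\mathbf{0}, \mathbf{C}_1) + (1-\pi)\,\mathcal{N}(\mathbf{0}, \mathbf{C}_2) \tag{19}$$

It can be shown that for the GLRT using the constructed PDF, the test statistic is

$$\max_{\boldsymbol{\theta}} \left\{ \boldsymbol{\theta}^T\mathbf{t} - \ln\left(\pi e^{\frac{1}{2}\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{C}_1\mathbf{G}\boldsymbol{\theta}} + (1-\pi) e^{\frac{1}{2}\boldsymbol{\theta}^T\mathbf{G}^T\mathbf{C}_2\mathbf{G}\boldsymbol{\theta}}\right) \right\} \tag{20}$$

and the clairvoyant GLRT statistic is

$$-\ln\left[\frac{\pi}{\det^{1/2}(\mathbf{C}_1)}\exp\left(-\frac{1}{2}\mathbf{t}^T(\mathbf{G}^T\mathbf{C}_1\mathbf{G})^{-1}\mathbf{t}\right) + \frac{1-\pi}{\det^{1/2}(\mathbf{C}_2)}\exp\left(-\frac{1}{2}\mathbf{t}^T(\mathbf{G}^T\mathbf{C}_2\mathbf{G})^{-1}\mathbf{t}\right)\right] \tag{21}$$

when $q < p$.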

6  Simulations 

Since the GLRT using the constructed PDF coincides with the clairvoyant GLRT under Gaussian noise, as shown in Subsection 5.1, we only compare the performances under non-Gaussian noise (both uncorrelated noise as in (14) and correlated noise as in (19)). Consider the model

$$x[n] = A_1 + A_2 r^n + A_3\cos(2\pi f n + \phi) + w[n] \tag{22}$$

for $n = 0, 1, \ldots, N-1$, with known $r$ and frequency $f$ but unknown amplitudes $A_1$, $A_2$, $A_3$ and phase $\phi$. This is a linear model as in (9), where

$$\mathbf{H} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & r & \cos(2\pi f) & \sin(2\pi f) \\ \vdots & \vdots & \vdots & \vdots \\ 1 & r^{N-1} & \cos(2\pi f(N-1)) & \sin(2\pi f(N-1)) \end{bmatrix}$$

and $\boldsymbol{\alpha} = [A_1, A_2, A_3\cos\phi, -A_3\sin\phi]^T$.

Let $\mathbf{w}$ have an uncorrelated Gaussian mixture distribution as in (14). For the partially observed linear model, we observe two sensor outputs as in (10). We compare the GLRT in (15) with the clairvoyant GLRT in (18). Note that the MLE of $\boldsymbol{\theta}$ in (15) is found numerically, not by the asymptotic approximation in (16). In the simulation, we use $N = 20$, $A_1 = 2$, $A_2 = 3$, $A_3 = 4$, $\phi = \pi/4$, $r = 0.95$, $f = 0.34$, $\pi = 0.9$, $\sigma_1^2 = 50$, $\sigma_2^2 = 500$, and $\mathbf{H}_1$ and $\mathbf{H}_2$ are the first and third columns of $\mathbf{H}$ respectively, i.e., $\mathbf{H}_1 = [1, 1, \ldots, 1]^T$, $\mathbf{H}_2 = [1, \cos(2\pi f), \ldots, \cos(2\pi f(N-1))]^T$. As shown in Figure 2, the performances are almost the same, which supports their equivalence under the small-signal assumption shown in Section 5.

Figure 2: ROC curves for the GLRT using the constructed PDF and the clairvoyant GLRT with uncorrelated Gaussian mixture noise. (Horizontal axis: probability of false alarm.)
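The setup of this first experiment can be sketched as follows, using the parameter values quoted above. The helper names `build_model` and `mixture_noise` and the random seed are illustrative choices of ours, and the sketch draws a single realization rather than reproducing the ROC curves.

```python
import numpy as np

def build_model(N, r, f):
    """Observation matrix H for (22): columns 1, r^n, cos(2 pi f n), sin(2 pi f n)."""
    n = np.arange(N)
    return np.column_stack([np.ones(N), r ** n,
                            np.cos(2 * np.pi * f * n), np.sin(2 * np.pi * f * n)])

def mixture_noise(rng, N, pi1, v1, v2):
    """One draw of w from (14): the whole vector shares one variance, picked at random,
    which makes the samples uncorrelated but not independent."""
    var = v1 if rng.random() < pi1 else v2
    return np.sqrt(var) * rng.standard_normal(N)

N, r, f = 20, 0.95, 0.34
A1, A2, A3, phi = 2.0, 3.0, 4.0, np.pi / 4
H = build_model(N, r, f)
alpha = np.array([A1, A2, A3 * np.cos(phi), -A3 * np.sin(phi)])
G = H[:, [0, 2]]                                # H1 and H2: first and third columns of H

rng = np.random.default_rng(0)                  # seed chosen arbitrarily
x = H @ alpha + mixture_noise(rng, N, 0.9, 50.0, 500.0)
t = G.T @ x                                     # sensor outputs, as in (10)
stat_clairvoyant = t @ np.linalg.solve(G.T @ G, t)   # clairvoyant statistic (18)
```

Repeating the draw many times with and without the signal term, and thresholding the statistics, would give empirical ROC curves of the kind summarized in Figure 2.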

Next, for the same model in (22), let $\mathbf{w}$ have a correlated Gaussian mixture distribution as in (19). We compare the performances of the GLRT using the constructed PDF as in (20) and the clairvoyant GLRT as in (21). We use $N = 20$, $A_1 = 3$, $A_2 = 4$, $A_3 = 3$, $\phi = \pi/7$, $r = 0.9$, $f = 0.46$, $\pi = 0.7$, $\mathbf{H}_1 = [1, 1, \ldots, 1]^T$, $\mathbf{H}_2 = [1, \cos(2\pi f), \ldots, \cos(2\pi f(N-1))]^T$. The covariance matrices $\mathbf{C}_1$, $\mathbf{C}_2$ are generated using $\mathbf{C}_1 = \mathbf{R}_1^T\mathbf{R}_1$, $\mathbf{C}_2 = \mathbf{R}_2^T\mathbf{R}_2$, where $\mathbf{R}_1$, $\mathbf{R}_2$ are full rank $N \times N$ matrices. As shown in Figure 3, the performances are still very similar.
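One way to generate correlated mixture noise of this kind is sketched below. The text does not say how $\mathbf{R}_1$ and $\mathbf{R}_2$ were chosen, so the Gaussian random matrices, the scale factor on `R2`, the seed, and the function name `correlated_mixture_noise` are all assumptions made for illustration.

```python
import numpy as np

def correlated_mixture_noise(rng, R1, R2, pi1, size):
    """Draw `size` vectors from pi1*N(0, R1^T R1) + (1 - pi1)*N(0, R2^T R2)."""
    N = R1.shape[0]
    z = rng.standard_normal((size, N))
    pick1 = rng.random(size) < pi1               # component label for each draw
    # a row vector z @ R has covariance R^T R
    return np.where(pick1[:, None], z @ R1, z @ R2)

rng = np.random.default_rng(0)
N = 20
R1 = rng.standard_normal((N, N))                 # full rank with probability one
R2 = 3.0 * rng.standard_normal((N, N))           # scale factor is an arbitrary choice
C1, C2 = R1.T @ R1, R2.T @ R2                    # covariance matrices as in the text
w = correlated_mixture_noise(rng, R1, R2, pi1=0.7, size=50000)
C_mix = 0.7 * C1 + 0.3 * C2                      # overall covariance of the mixture
```

The sample covariance of the rows of `w` approaches `C_mix` as the number of draws grows, which gives a quick sanity check on the generator.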


7  Conclusions 

A novel method of combining sensor outputs for detection based on the exponential family has been proposed. It does not require the joint PDF under $\mathcal{H}_1$. The constructed PDF has been shown to be asymptotically optimal in KL divergence. The GLRT statistic based on this method can be shown to be equivalent to the clairvoyant GLRT statistic for the partially observed linear model with both Gaussian and non-Gaussian noise. The equivalence is also observed in simulations.


Figure 3: ROC curves for the GLRT using the constructed PDF and the clairvoyant GLRT with correlated Gaussian mixture noise. (Horizontal axis: probability of false alarm.)